Automatic topic identification of health-related messages in online health community using text classification
Abstract
To facilitate patient involvement in online health communities and help patients obtain the informational and emotional support they need, this paper proposes an approach for automatically identifying the topics of health-related messages in an online health community, thereby helping patients reach the messages most relevant to their queries efficiently. We present a feature-based classification framework for automatic topic identification. We first collected messages related to a set of predefined topics in an online health community. We then combined three types of features (n-gram-based features, domain-specific features, and sentiment features) to build four feature sets for representing health-related text. Finally, we evaluated the topic classification model with three text classification techniques: C4.5, Naïve Bayes, and SVM. Comparing the feature sets and classification techniques showed that n-gram-based, domain-specific, and sentiment features were all effective in distinguishing different types of health-related topics. Feature reduction based on information gain also improved topic classification performance. Among the classification techniques, SVM significantly outperformed C4.5 and Naïve Bayes. The experimental results demonstrate that the proposed approach can identify the topics of online health-related messages efficiently.
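The abstract does not spell out how the information-gain feature reduction was implemented, so the following is only a minimal sketch of the general technique, assuming binary presence/absence n-gram features and the standard definition IG(t) = H(C) - H(C | t). The function names (`ngrams`, `entropy`, `information_gain`, `select_features`) and the toy messages and topic labels are hypothetical and are not taken from the paper. Training the classifiers themselves (C4.5, Naïve Bayes, SVM) is not shown.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) present in a message."""
    tokens = text.lower().split()
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(term) = H(labels) - H(labels | term present or absent)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Keep the k n-grams with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = set().union(*docs)
    ranked = sorted(vocab,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy, made-up messages and topic labels (illustrative only, not from the paper).
messages = [
    "my chemo side effects are bad",
    "chemo made me tired",
    "feeling scared and alone",
    "so scared before surgery",
]
topics = ["treatment", "treatment", "emotion", "emotion"]

docs = [ngrams(m) for m in messages]
top = select_features(docs, topics, k=2)
print(top)
```

In this toy data, the n-grams that occur in every message of one topic and in no message of the other separate the classes perfectly, so they rank first. In a realistic setting the retained n-grams would be combined with the domain-specific and sentiment features before being passed to a classifier.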
Similar resources
Health-Related Hot Topic Detection in Online Communities Using Text Clustering
Recently, health-related social media services, especially online health communities, have rapidly emerged. Patients with various health conditions participate in online health communities to share their experiences and exchange healthcare knowledge. Exploring hot topics in online health communities helps us better understand patients' needs and interest in health-related knowledge. However, th...
Automatic Identification of Messages Related to Adverse Drug Reactions from Online User Reviews using Feature-based Classification
BACKGROUND: User-generated medical messages on the Internet contain extensive information related to adverse drug reactions (ADRs) and are known to be valuable resources for post-marketing drug surveillance. The aim of this study was to find an effective method to identify messages related to ADRs automatically from online user reviews. METHODS: We conducted experiments on online user reviews using di...
Classification of Virtual Investing-Related Community Postings
The rapid growth of online investing and virtual investing-related communities (VICs) has a wide-ranging impact on research, practice and policy. Given the enormous volume of postings on VICs, automated classification of messages to extract relevance is critical. Classification is complicated by three factors: (a) the number of irrelevant or "noise" messages (e.g., spam, insults), (b) t...
Combining Text Mining and Data Visualization Techniques to Understand Consumer Experiences of Electronic Cigarettes and Hookah in Online Forums
Introduction: Since their introduction to the US market in 2007, electronic cigarettes (e-cigarettes) have posed considerable challenges to both public health authorities and government regulators, especially given the debate – in both the scientific world and the community at large – regarding the potential advantages (e.g. helping individuals quit smoking) and disadvantages (e.g. renormalizing...
Investigating the Effect of Educational Text Messages on Self-Care in Hypertensive Patients in a Hypertension Clinic in Kerman
Introduction: Self-care in chronic diseases implies the study and control of disease symptoms, maintaining a healthy lifestyle, and daily functioning. The objective of this study was to investigate the effect of educational text messages on self-care in hypertensive patients in one of the hypertension clinics in Kerman in 2020. Method: The statistical population of this quasi-experimental study...